Goto

Collaborating Authors

 lr policy


Boosting Deep Ensembles with Learning Rate Tuning

Jin, Hongpeng, Wu, Yanzhao

arXiv.org Artificial Intelligence

The Learning Rate (LR) has a high impact on deep learning training performance. A common practice is to train a Deep Neural Network (DNN) multiple times with different LR policies to find the optimal LR policy, which has been widely recognized as a daunting and costly task. Moreover, multiple times of DNN training has not been effectively utilized. In practice, often only the optimal LR is adopted, which misses the opportunities to further enhance the overall accuracy of the deep learning system and results in a huge waste of both computing resources and training time. This paper presents a novel framework, LREnsemble, to effectively leverage effective learning rate tuning to boost deep ensemble performance. We make three original contributions. First, we show that the LR tuning with different LR policies can produce highly diverse DNNs, which can be supplied as base models for deep ensembles. Second, we leverage different ensemble selection algorithms to identify high-quality deep ensembles from the large pool of base models with significant accuracy improvements over the best single base model. Third, we propose LREnsemble, a framework that utilizes the synergy of LR tuning and deep ensemble techniques to enhance deep learning performance. The experiments on multiple benchmark datasets have demonstrated the effectiveness of LREnsemble, generating up to 2.34% accuracy improvements over well-optimized baselines.


FastSpiker: Enabling Fast Training for Spiking Neural Networks on Event-based Data through Learning Rate Enhancements for Autonomous Embedded Systems

Bano, Iqra, Putra, Rachmad Vidya Wicaksana, Marchisio, Alberto, Shafique, Muhammad

arXiv.org Artificial Intelligence

Autonomous embedded systems (e.g., robots) typically necessitate intelligent computation with low power/energy processing for completing their tasks. Such requirements can be fulfilled by embodied neuromorphic intelligence with spiking neural networks (SNNs) because of their high learning quality (e.g., accuracy) and sparse computation. Here, the employment of event-based data is preferred to ensure seamless connectivity between input and processing parts. However, state-of-the-art SNNs still face a long training time to achieve high accuracy, thereby incurring high energy consumption and producing a high rate of carbon emission. Toward this, we propose FastSpiker, a novel methodology that enables fast SNN training on event-based data through learning rate enhancements targeting autonomous embedded systems. In FastSpiker, we first investigate the impact of different learning rate policies and their values, then select the ones that quickly offer high accuracy. Afterward, we explore different settings for the selected learning rate policies to find the appropriate policies through a statistical-based decision. Experimental results show that our FastSpiker offers up to 10.5x faster training time and up to 88.39% lower carbon emission to achieve higher or comparable accuracy to the state-of-the-art on the event-based automotive dataset (i.e., NCARS). In this manner, our FastSpiker methodology paves the way for green and sustainable computing in realizing embodied neuromorphic intelligence for autonomous embedded systems.


Rethinking Learning Rate Tuning in the Era of Large Language Models

Jin, Hongpeng, Wei, Wenqi, Wang, Xuyu, Zhang, Wenbin, Wu, Yanzhao

arXiv.org Artificial Intelligence

Large Language Models (LLMs) represent the recent success of deep learning in achieving remarkable human-like predictive performance. It has become a mainstream strategy to leverage fine-tuning to adapt LLMs for various real-world applications due to the prohibitive expenses associated with LLM training. The learning rate is one of the most important hyperparameters in LLM fine-tuning with direct impacts on both fine-tuning efficiency and fine-tuned LLM quality. Existing learning rate policies are primarily designed for training traditional deep neural networks (DNNs), which may not work well for LLM fine-tuning. We reassess the research challenges and opportunities of learning rate tuning in the coming era of Large Language Models. This paper makes three original contributions. First, we revisit existing learning rate policies to analyze the critical challenges of learning rate tuning in the era of LLMs. Second, we present LRBench++ to benchmark learning rate policies and facilitate learning rate tuning for both traditional DNNs and LLMs. Third, our experimental analysis with LRBench++ demonstrates the key differences between LLM fine-tuning and traditional DNN training and validates our analysis.


Selecting and Composing Learning Rate Policies for Deep Neural Networks

Wu, Yanzhao, Liu, Ling

arXiv.org Artificial Intelligence

The choice of learning rate (LR) functions and policies has evolved from a simple fixed LR to the decaying LR and the cyclic LR, aiming to improve the accuracy and reduce the training time of Deep Neural Networks (DNNs). This paper presents a systematic approach to selecting and composing an LR policy for effective DNN training to meet desired target accuracy and reduce training time within the pre-defined training iterations. It makes three original contributions. First, we develop an LR tuning mechanism for auto-verification of a given LR policy with respect to the desired accuracy goal under the pre-defined training time constraint. Second, we develop an LR policy recommendation system (LRBench) to select and compose good LR policies from the same and/or different LR functions through dynamic tuning, and avoid bad choices, for a given learning task, DNN model and dataset. Third, we extend LRBench by supporting different DNN optimizers and show the significant mutual impact of different LR policies and different optimizers. Evaluated using popular benchmark datasets and different DNN models (LeNet, CNN3, ResNet), we show that our approach can effectively deliver high DNN test accuracy, outperform the existing recommended default LR policies, and reduce the DNN training time by 1.6$\sim$6.7$\times$ to meet a targeted model accuracy.


Demystifying Learning Rate Polices for High Accuracy Training of Deep Neural Networks

Wu, Yanzhao, Liu, Ling, Bae, Juhyun, Chow, Ka-Ho, Iyengar, Arun, Pu, Calton, Wei, Wenqi, Yu, Lei, Zhang, Qi

arXiv.org Machine Learning

J. W atson Research, Y orktown Heights, NY, USA Abstract --Learning Rate (LR) is an important hyper-parameter to tune for effective training of deep neural networks (DNNs). Even for the baseline of a constant learning rate, it is nontrivial to choose a good constant value for training a DNN. Dynamic learning rates involve multi-step tuning of LR values at various stages of the training process and offer high accuracy and fast convergence. However, they are much harder to tune. In this paper, we present a comprehensive study of 13 learning rate functions and their associated LR policies by examining their range parameters, step parameters, and value update parameters. We propose a set of metrics for evaluating and selecting LR policies, including the classification confidence, variance, cost, and robustness, and implement them in LRBench, an LR benchmarking system. LRBench can assist end-users and DNN developers to select good LR policies and avoid bad LR policies for training their DNNs. We tested LRBench on Caffe, an open source deep learning framework, to showcase the tuning optimization of LR policies. Evaluated through extensive experiments, we attempt to demystify the tuning of LR policies by identifying good LR policies with effective LR value ranges and step sizes for LR update schedules. I NTRODUCTION Deep neural networks (DNNs) are widely employed to mine Big Data and gain deep insight on Big Data, ranging from image classification, voice recognition, text mining and Natural Language Processing (NLP). One of the most important performance optimization for DNNs is to train a deep learning model capable of achieving high test accuracy.